Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Neural Information Processing Systems

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.
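The "flexible constraint" idea can be illustrated with a toy sketch: instead of uniformly penalizing offline value estimates the way conservative methods do, cap the offline critic's target at the estimate of an auxiliary critic trained online in a simulator, plus a slack margin. The function name and the margin mechanism below are hypothetical illustrations, not CoWorld's actual algorithm.

```python
def constrained_value_target(offline_q: float, online_q: float,
                             margin: float = 0.1) -> float:
    """Toy sketch (hypothetical): let the offline value estimate grow
    freely up to the online critic's estimate plus a slack margin,
    rather than applying a uniform conservative penalty to all values."""
    return min(offline_q, online_q + margin)
```

Under this sketch, an over-optimistic offline estimate is clipped toward the online critic's value, while estimates already below that bound pass through untouched, leaving room to explore potential advantages.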


Generating Auxiliary Tasks with Reinforcement Learning

Goldfeder, Judah, So, Matthew, Lipson, Hod

arXiv.org Artificial Intelligence

Auxiliary Learning (AL) is a form of multi-task learning in which a model trains on auxiliary tasks to boost performance on a primary objective. While AL has improved generalization across domains such as navigation, image classification, and NLP, it often depends on human-labeled auxiliary tasks that are costly to design and require domain expertise. Meta-learning approaches mitigate this by learning to generate auxiliary tasks, but typically rely on gradient-based bi-level optimization, adding substantial computational and implementation overhead. We propose RL-AUX, a reinforcement-learning (RL) framework that dynamically creates auxiliary tasks by assigning an auxiliary label to each training example, rewarding the agent whenever its selections improve performance on the primary task. We also explore learning per-example weights for the auxiliary loss. On CIFAR-100 grouped into 20 superclasses, our RL method outperforms human-labeled auxiliary tasks and matches the performance of a prominent bi-level optimization baseline. We present similarly strong results on other classification datasets. These results suggest RL is a viable path to generating effective auxiliary tasks.
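The reward loop described above can be sketched as a toy bandit: an agent assigns one auxiliary label per example and is reinforced by the resulting change in a stand-in primary-task score. All names, the epsilon-greedy policy, and the scoring function are hypothetical illustrations, not the RL-AUX implementation.

```python
import random

def greedy_labels(prefs):
    """Pick each example's currently preferred auxiliary label."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in prefs]

def rl_aux_train(num_examples, num_labels, primary_score,
                 steps=300, eps=0.3, seed=0):
    """Toy bandit-style sketch (hypothetical): assign an auxiliary label
    to each example; the reward is the improvement of a stand-in
    primary-task score over the best score seen so far."""
    rng = random.Random(seed)
    prefs = [[0.0] * num_labels for _ in range(num_examples)]
    best = primary_score(greedy_labels(prefs))
    for _ in range(steps):
        # epsilon-greedy choice of an auxiliary label per example
        labels = [rng.randrange(num_labels) if rng.random() < eps
                  else max(range(num_labels), key=lambda k: prefs[i][k])
                  for i in range(num_examples)]
        score = primary_score(labels)
        reward = score - best              # improvement on the primary task
        for i, a in enumerate(labels):
            prefs[i][a] += reward          # reinforce label choices that helped
        best = max(best, score)
    return greedy_labels(prefs)
```

In the full method the primary score would come from validation performance of the model trained with the chosen auxiliary labels; here it is any callable mapping a label assignment to a number.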


DARIL: When Imitation Learning outperforms Reinforcement Learning in Surgical Action Planning

Boels, Maxence, Robertshaw, Harry, Booth, Thomas C, Dasgupta, Prokar, Granados, Alejandro, Ourselin, Sebastien

arXiv.org Artificial Intelligence

Surgical action planning requires predicting future instrument-verb-target triplets for real-time assistance. While teleoperated robotic surgery provides natural expert demonstrations for imitation learning (IL), reinforcement learning (RL) could potentially discover superior strategies through self-exploration. We present the first comprehensive comparison of IL versus RL for surgical action planning on CholecT50. Our Dual-task Autoregressive Imitation Learning (DARIL) baseline achieves 34.6% action triplet recognition mAP and 33.6% next frame prediction mAP with smooth planning degradation to 29.2% at 10-second horizons. We evaluated three RL variants: world model-based RL, direct video RL, and inverse RL enhancement. Surprisingly, all RL approaches underperformed DARIL--world model RL dropped to 3.1% mAP at 10s while direct video RL achieved only 15.9%. Our analysis reveals that distribution matching on expert-annotated test sets systematically favors IL over potentially valid RL policies that differ from training demonstrations. This challenges assumptions about RL superiority in sequential decision making and provides crucial insights for surgical AI development.
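The autoregressive planning setup can be sketched as a simple rollout loop: predict the next instrument-verb-target triplet, append it to the observed history, and repeat up to the planning horizon. The `model` callable and the triplet values below are hypothetical placeholders, not DARIL's actual interface.

```python
def autoregressive_plan(model, history, horizon):
    """Toy sketch of autoregressive action planning: at each step the
    model predicts the next instrument-verb-target triplet from the
    history so far, and the prediction is fed back in as context."""
    plan = []
    for _ in range(horizon):
        triplet = model(history)       # e.g. ("grasper", "retract", "gallbladder")
        plan.append(triplet)
        history = history + [triplet]  # feed the prediction back as context
    return plan
```

This feedback of the model's own predictions is also why planning quality degrades with horizon, as in the 10-second results reported above: early errors compound through the context.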






Review for NeurIPS paper: Multi-agent active perception with prediction rewards

Neural Information Processing Systems

Weaknesses: The paper is well written and easy to follow. The problem of active perception is also interesting. There are a few areas where more clarification is needed, as pointed out below:
-- The authors have highlighted a number of previous models for the problem of active perception, such as Dec-ρPOMDP, POMDP-IR, etc. Given the focus on converting this problem to a decentralized framework, it is not clearly conveyed why decentralizing the problem is significant. There are hints in the paper, such as reduced communication overhead, but no empirical evidence is presented to justify decentralized approaches over these previous approaches (e.g., how much communication overhead is reduced).
-- The technical approach presented by the authors is elegant and simple, but it is essentially a heuristic. The bound provided in Theorem 1 would seem to be loose in the worst case (and its value in the experiments is not shown).